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Abstract 

Q_J We study a family of combinatorial optimization problems defined by a parameter 

Q p G [0,1], which involves spectral functions applied to positive semidefinite matrices, 

and has some application in the theory of optimal experimental design. This family of 
problems tends to a generalization of the classical maximum coverage problem as p goes 
to 0, and to a trivial instance of the knapsack problem as p goes to 1. 

In this article, we establish a matrix inequality which shows that the objective function 
is submodular for all p G [0, 1], from which it follows that the greedy approach, which has 
^ often been used for this problem, always gives a design within 1 — 1/e of the optimum. We 

CN next study the design found by rounding the solution of the continuous relaxed problem, 

an approach which has been applied by several authors. We prove an inequality which 
generalizes a classical result from the theory of optimal designs, and allows us to give a 
rounding procedure with an approximation factor which tends to 1 as p goes to 1. 
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c5 1 Introduction 

This work is motivated by a generalization of the classical maximum coverage problem which 
arises in the study of optimal experimental designs. This problem may be formally defined 
as follows: given s positive semidefinite matrices Mi, . . . , Ms of the same size and an integer 
N < s, solve: 



max rank ( 2, ^i ) (-Po 

s. t. card(/) < A^, 
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where we use the standard notation [s] :={!,..., s} and card(5') denotes the cardinahty of S. 
When each Mj is diagonal, it is easy to see that Problem (Pq) is equivalent to a max-coverage 
instance, by defining the sets Si = {k : {Mi)k,k > 0}, so that the rank in the objective of 
Problem (Pq) is equal to card ( Ujg/ Sij . 

A more general class of problems arising in the study of optimal experimental designs is 
obtained by considering a deformation of the rank which is defined through a spectral function. 
Given p E [0, 1], solve: 

max yDp (n) (Pp) 

s.t. ^ni<N, 

i€[s] 

where (^p(n) is the sum of the eigenvalues of J2ie\s] '^i^i raised to the exponent p: if the eigenval- 
ues of the positive semidefinite matrix Xliefsi "^i^i ^"^^ ^i, . . . , A^ (counted with multiplicities), 
^p{n) is defined by 



(/}p(n) = trace f ^njMjj =^A^. 



JG[s] k=l 

We shall see that Problem (Pq) is the limit of Problem (P^) as p — )■ O"*" indeed. On the other 
hand, the limit of Problem (Pp) as p — )■ 1 is a knapsack problem (in fact, it is the trivial instance 
in which the i^^ item has weight 1 and utility Ui = trace Mj). Note that a matrix Mi may be 
chosen rii times in Problem (Pp), while choosing a matrix more than once in Problem (Pq) 
cannot increase the rank. Therefore we also define the binary variant of Problem (Pp): 

max l^p{n): ne {0, l}^ J] n, < iV I (P^-) 

[ ieW J 

We shall also consider the case in which the selection of the i^^ matrix costs q, and a total 
budget B is allowed. This is the budgeted version of the problem: 

max I if pin): neW, ^ CiUi < B \ [P^"^^) 

i ieW J 

Throughout this article, we use the term design for the variable n = (ni,...,ns) G N'*. 
We say that n is a N— replicated design if it is feasible for Problem {Pp)., a N— binary design 
if n is feasible for Problem (P^"^), and a B— budgeted design when it satisfies the constraints 
of (Ppb^s). 

1.1 Motivation: optimal experimental design 

The theory of optimal design of experiments plays a central role in statistics. It studies how to 
best select experiments in order to estimate a set of parameters. Under classical assumptions, 
the best linear unbiased estimator is given by least square theory, and lies within confidence 
ellipsoids which are described by a positive semidefinite matrix depending only on the selected 



experiments. The optimal design of experiments aims at selecting the experiments in order to 
make these confidence elhpsoids as small as possible, which leads to more accurate estimators. 

A common approach consists in minimizing a scalar function measuring these ellipsoids, 
where the function is taken from the class of $p- information functions proposed by Kiefer [Kie75]. 
This leads to a combinatorial optimization problem (decide how many times each experiment 
should be performed) involving a spectral function which is applied to the information ma- 
trix of the experiments. For p g]0, 1], the Kiefer's $p-optimal design problem is equivalent to 
Problem [Pp) (up to the exponent 1/p in the objective function). 

In fact, little attention has been given to the combinatorial aspects of Problem (Pp) in 
the optimal experimental design literature. The reason is that there is a natural relaxation 
of the problem which is much more tractable and usually yields very good results: instead of 
determining the exact number of times Ui that each experiment will be selected, the optimization 
is done over the fractions Wi = Ui/N e [0, 1], which reduces the problem to the maximization of 
a concave function over a convex set (this is the theory of approximate optimal designs). For the 
common case, in which the number N of experiments to perform is large and N > s (where s is 
the number of available experiments), this approach is justified by a result of Pukelsheim and 
Rieder [PR92], who give a rounding procedure to transform an optimal approximate design w* 
into an iV— replicated design n = (ni, . . . , rig) which approximates the optimum of the Kiefer's 
$p— optimal design problem within a factor 1 — -^. 

The present developments were motivated by a joint work with Bouhtou and 
Gaubert [BGS08, SGBIO] on the application of optimal experimental design methods to the 
identification of the traffic in an Internet backbone. This problem describes an underinstru- 
mented situation., in which a small number N < s oi experiments should be selected. In this 
case, the combinatorial aspects of Problem (Pp) become crucial. A similar problem was studied 
by Song, Qiu and Zhang [SQZ06], who proposed to use a greedy algorithm to approximate 
the solution of Problem {Pj^. In this paper, we give an approximation bound which justifies 
this approach. Another question addressed in this manuscript is whether it is appropriate to 
take roundings of (continuous) approximate designs in the underinstrumented situation (re- 
call that this is the common approach when dealing with experimental design problems in the 
overinstrumented case, where the number A^ of experiments is large when compared to s). 

Appendix A is devoted to the application to the theory of optimal experimental designs; we 
explain how a statistical problem (choose which experiments to conduct in order to estimate a 
set of parameters) leads to the study of Problem (Pp), with a particular focus to the underin- 
strumented situation described above. For more details on the subject, the reader is referred 
to the monographs of Fedorov [Fed72] and Pukelsheim [Puk93] . 

1.2 Organisation and contribution of this article 

The objective of this article is to study some approximation algorithms for the class of 
problems (Pp)pg [o,i]. Several results presented in this article were already announced in the 
companion papers [BGS08, BGSIO], without the proofs. This paper provides all the proofs of 
the results of [BGSIO] and gives new results for the rounding algorithms. We shall now present 
the contribution and the organisation of this article. 

In Section 2, we establish a matrix inequality (Proposition 2.3) which shows that a class 
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Table 1: Summary of the approximation bounds obtained in this paper, as well as the bound of 
Pukelsheim and Rieder [PR92]. The column "Reference" indicates the number of the theorem, 
proposition or remark where the bound is proved (a citation in parenthesis means a direct 
application of a result of the cited paper, which is possible thanks to the submodularity of ifp 
proved in Corollary 2.5). In the table, posterior denotes a bound which depends on the contin- 
uous solution w* of the relaxed problem, while a prior bound depends only on the parameters 
of the problem. 



of spectral functions is submodular (Corollary 2.4). As a particular case of the latter result, 
the objective function of Problem (Pp) is submodular for all p G [0, 1]. The submodularity of 
this class of spectral functions is an original contribution of this article for < p < 1, however 
a particular case of this result was announced -without a proof- in the companion paper on 
the telecom application [BGS08]. In the limit case p = 0, we obtain two functions which were 
already known to be submodular (the rank and the log of determinant of a sum of matrices). 
Due to a celebrated result of Nemhauser, Wolsey and Fisher [NWF78] , the submodularity 
of the criterion implies that the greedy approach, which has often been used for this problem, 
always gives a design within 1 — e~^ of the optimum (Theorem 2.6). We point out that the 



submodularity of the determinant criterion was noticed earlier in the optimal experimental 
design literature, but under an alternative form [RS89]: Robertazzi and Schwartz showed that 
the determinant of the inverse of a sum of matrices is supermodular, and they used it to write 
an algorithm for the construction of approximate designs (i.e. without integer variables) which 
is based on the accelerated greedy algorithm of Minoux [Min78]. In contrast, the originality of 
the present paper is to show that a whole class of criteria satisfies the submodularity property, 
and to study the consequences in terms of approximability of a combinatorial optimization 
problem. 

In Section 3, we investigate the legitimacy of using rounding algorithms to construct a 
iV— replicated design n = [rii, . . . , Ug) G N* or a A^-binary design n G {0, 1}* from an optimal 
approximate design w*, i.e. a solution of a continuous relaxation of Problem (Pp). We establish 
an inequality (Propositions 3.1 and 3.3) which bounds from below the approximation ratio of 
any integer design, by a function which depends on the continuous solution w*. Interestingly, 
this inequality generalizes a classical result from the theory of optimal designs (the upper 
bound on the weights of a D-optimal design [PukSO, HT09] is a particular case (p = 0) of 
Proposition 3.1). The proof of this result is presented in Appendix B ; it relies on matrix 
inequalities and several properties of the differentiation of a scalar function applied to symmetric 
matrices. Then we point out that the latter lower bound can be maximized by an incremental 
algorithm which is well known in the resource allocation community (Algorithm 3.1), and 
we derive approximation bounds for Problems (Pp) and (P^"') which do not depend on w* 
(Theorems 3.7 and 3.8). For the problem with replicated designs [Pp), the approximation factor 
is an increasing function of p which tends to 1 as p — )■ 1. In many cases, the approximation 
guarantee for designs obtained by rounding is better than the greedy approximation factor 
l-e-\ 

We have summarized in Table 1 the approximation results proved in this paper (this table 
also includes another known approximability result for Problem (Pp), the efficient apportion- 
ment rounding of Pukelsheim and Rieder [PR92]). 

2 Submodularity and Greedy approach 

In this section, we study the greedy algorithm for solving Problems (Pp) and (-Pp '") through the 
submodularity of (pp. We first recall a result presented in [BGS08], which states that the rank 
optimization problem is NP-hard, by a reduction from the Maximum Coverage problem. It 
follows that for all positive e, there is no polynomial-time algorithm which approximates (Pq) 
by a factor oi 1 — - + e unless P = NP (this has been proved by Feige for the Maximum 
Coverage problem [Fei98]). Nevertheless, we show that this bound is the worst possible ever, 
and that the greedy algorithm always attains it. 

To this end, we show that a class of spectral functions (which includes the objective function 
of Problem (Pp)) is nondecreasing submodular. The maximization of submodular functions over 
a matroid has been extensively studied [NWF78, CC84, CCPV07, Von08, KST09], and we shall 
use known approximability results. 

To study its approximability, we can think of Problem [Pp) as the maximization of a set 
function </?' : 2^ i— )► IR+. To this end, note that each design n can be seen as a subset of E, 
where ii^ is a pool which contains A^ copies of each experiment (this allows us to deal with 



replicated designs, i.e. with experiments that are conducted several times; if replication is not 
allowed (Problem (Pp'")), we simply set E := [s]). Now, if S" is a subset of E corresponding to 
the design n, we define ^p{S) := (Pp{n). In the sequel, we identify the set function ip'^ with ipp 
(i.e., we omit the prime). 

We also point out that multiplicative approximation factors for the $p— optimal problem 
cannot be considered when p < 0, since the criterion is identically as long as the the infor- 
mation matrix is singular. For p < indeed, the instances of the $p-optimal problem where 
no feasible design lets Mp{n) be of full rank have an optimal value of 0. For all the other in- 
stances, any polynomial-time algorithm with a positive approximation factor would necessarily 
return a design of full rank. Provided that P ^ NP, this would contradict the NP-hardness of 
Set- Cover (it is easy to see that Set Cover reduces to the problem of deciding whether there 
exists a set S of cardinal N such that Ylies ^« ^^^ ^^^^ rank for some diagonal matrices Mi, by 
a similar argument to the one given in the first paragraph of this article). Hence, we investigate 
approximation algorithms only in the case p G [0, 1]. 

2.1 A class of submodular spectral functions 

In this section, we are going to show that a class of spectral functions is submodular. We recall 
that a real valued function F : 2^ — )■ M, defined on every subset of E is called nondecreasing 
if for all subsets I and J oi E, I C J implies F{I) < F{J). We also give the definition of a 
submodular function: 

Definition 2.1 ( Submodular ity). A real valued set function F : 2^ — )■ M is submodular if it 
satisfies the following condition : 

F{I) + F( J) > F{I U J) + F{I n J) for all I,J CE. 

We next recall the definition of operator monotone functions. The latter are real valued 
functions applied to hermitian matrices: ii A = f/Diag(Ai, . . . , Xm)U* is a. m x m hermitian 
matrix (where U is unitary and U* is the conjugate of U), the matrix f{A) is defined as 
f/Diag(/(Ai),...,/(A„))f/*. 

Definition 2.2 (Operator monotonicity) . A real valued function / is operator monotone on 
]R+ (resp. M.*^) if for every pair of positive semidefinite (resp. positive definite) matrices A and 

A^B^ f{A) ^ f{B). 
We say that / is operator antitone if — / is operator monotone. 

The next proposition is a matrix inequality of independent interest; it will be useful to 
show that (pp is submodular. Interestingly, it can be seen as an extension of the Ando-Zhan 
Theorem [AZ99], which reads as follows: Let A, B be semidefinite positive matrices. For any 
unitarily invariant norm ||| ■ |||, and for every non-negative operator monotone function f on 

[0,oo), 

l\fiA + B)l\<\lfiA) + f{B)l 

Kosem [Kos06] asked whether it is possible to extend this inequality as follows: 

WfiA + B + C)l\< If {A + B) + f{B + C)- /(C)lll, 



and gave a counterexample involving the trace norm and the function f{x) = ^^. However, 
we show in next proposition that the previous inequality holds for the trace norm and every 
primitive / of an operator antitone function (in particular, for f{x) = x^, p g]0, 1]). Note that 
the previous inequality is not true for any unitarily invariant norm and f{x) = x^ either. It is 
easy to find counterexamples with the spectral radius norm. 

Proposition 2.3. Let f be a real function defined on IR+ and differentiable on M^. // /' is 
operator antitone on M^, then for all triples {X, Y, Z) of m x m positive semidefinite matrices, 

trace f{X + Y + Z)+ trace f{Z) < trace f{X + Z) + trace f{Y + Z). (1) 

Proof. Since the eigenvalues of a matrix are continuous functions of its entries, and since S^^ 
is dense in S+, it suffices to establish the inequahty when X, F, and Z are positive definite. 
Let X be an arbitrary positive definite matrix. We consider the map: 

T I — > trace /(X + T) - trace /(T). 
The inequality to be proved can be rewritten as 

i){Y + Z) <^{Z). 

We will prove this by showing that ip is nonincreasing with respect to the Lowner ordering in the 
direction generated by any positive semidefinite matrix. To this end, we compute the Frechet 
derivative of ^ at T G S^"*" in the direction of an arbitrary matrix H G S^. By definition, 

DiP{T){H) = lira -{iP{T + eH)-iP{T)). (2) 

When / is an analytic function, X \ — > trace f{X) is Frechet-differentiable, and an explicit 
form of the derivative is known (see [HP95, JB06]): D (trace f{A)^{B) = trace (/'(A)i?). Since 
/' is operator antitone on M^, a famous result of Lowner [Low34] tells us (in particular) that 
/' is analytic at all points of the positive real axis, and the same holds for /. Provided that 
the matrix T is positive definite (and hence X + T), we have 



D^iT){H) = trace( (/'(X + T) - nT))H^ . 



By antitonicity of /' we know that the matrix W = f'{X + T) — f'{T) is negative semidefinite. 
For a matrix if ^ 0, we have therefore: 

Dip{T){H) = trace (WH) < 0. 

Consider now h{s) := ip{sY + Z). For all s G [0, 1], we have 

h'{s) = DiPisY + Z){Y) <0, 

and so, h{l) = iIj{Y + Z) < h{0) = ip{Z), from which the desired inequality follows. D 
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Corollary 2.4. Let Mi, . . . ,Ms be m x m positive semidefinite matrices. If f satisfies the 
assumptions of Proposition 2.3, then the set function F : 2^ -^ M defined by 

\/Ic[s], F(/)= trace /(J^M^), 

is submodular. 

Proof Let /, J C 2^. We define 

X=Y,M,,Y=Y,M„Z=Y, Mi. 
iei\j ieJ\i i£inJ 

It is easy to check that 

F{I) = trace f{X + Z), 

F{J) = trace /(F + Z), 
F(/n J) = trace /(Z), 
F{I U J) = trace f{X + Y + Z). 

Hence, Proposition 2.3 proves the submodularity of F. D 

A consequence of the previous result is that the objective function of Problem (P^) is 
submodular. In the limit case p — > 0^, we find two well-known submodular functions: 

Corollary 2.5. Let Mi, ...,Ms be m x m positive semidefinite matrices. 

(i) Vp g]0, 1],/ H-). trace(^jgj- Mj)^ is submodular. 

(a) I \-^ rank(^jgj Mi) is submodular. 

If moreover every Mi is positive definite, then: 

(Hi) 1 1-> logdet(^^gjMj) is submodular. 

Proof. It is known that x H- x'^ is operator antitone on ]R*j_ for all q G [— 1,0[. Therefore, the 
derivative of the function x H- x^ (which is px^~^), is operator antitone on M^ for all p g]0, 1[. 
This proves the point [i) for p ^ 1. The case p = 1 is trivial, by linearity of the trace. 

The submodularity of the rank (ii) and of log det (Hi) are classic. Interestingly, they are 
obtained as the limit case of {i) as p — )■ 0^. (For log det, we must consider the second term in 
the asymptotic development oi X \-^ trace X^ as p tends to 0"*", cf. Equation (24)). D 

2.2 Greedy approximation 

We next present some consequences of the submodularity of ipp for the approximability of 
Problem (Pp). Note that the results of this section hold in particular for p = 0, and hence 
for the rank maximization problem (Pq)- They also hold for E = [s], i.e. for Problem (Pj',^'")- 
We recall that the principle of the greedy algorithm is to start from Qo = ^ ^-^d to construct 
sequentially the sets 

Qk+i ■= Qk U argmax^g£;\g^ Vp{Gk U {i}), 

until k = N. 



Theorem 2.6 (Approximability of Problem (Pp))- Let p G [0, 1]. The greedy algorithm always 
yields a solution within a factor 1 — - of the optimum of Problem (Pp). 

Proof. We know from Corollary 2.5 that for all p G [0, 1], ipp is submodular {p = corresponding 
to the rank maximization problem). In addition, the function (pp is nondecreasing, because 
X — > X^ is a matrix monotone function for p G [0, 1] (see e.g. [Zha02]) and v^p(0) = 0. 

Nemhauser, Wolsey and Fisher [NWF78] proved the result of this theorem for any non- 
decreasing submodular function / satisfying /(0) = which is maximized over a uniform 
matroid. Moreover when the maximal number of matrices which can be selected is N, this 
approximability ratio can be improved to 1 — (l — 1/A^) . □ 

Remark 2.7. One can obtain a better bound by considering the total curvature of a given 
instance, which is defined by: 

ie[s] y^pib}) 

Conforti and Cornuejols [CC84] proved that the greedy algorithm always achieves a 
^ (l — (1 — -^)^)-approximation factor for the maximization of an arbitrary nondecreasing sub- 
modular function with total curvature c. In particular, since cpi is additive it follows that the 
total curvature for p = 1 is c = 0, yielding an approximation factor of 1: 

lim l(l_(l_^)^)=l. 

As a consequence, the greedy algorithm always gives the optimal solution of the problem. Note 
that Problem (Pi) is nothing but a knapsack problem, for which it is well known that the greedy 
algorithm is optimal if each available item has the same weight. However, it is not possible 
to give an upper bound on the total curvature c for other values of p G [0, 1[, and c has to be 
computed for each instance. 

Remark 2.8. The problem of maximizing a nondecreasing submodular function subject to a 
budget constraint of the form ^^ Qnj < 5, where q > is the cost for selecting the element i 
and B is the total allowed budget, has been studied by several authors. Wolsey presented an 
adapted greedy algorithm [Wol82] with a proven approximation guarantee of 1 — e~^ ~ 0.35, 
where (3 is the unique root of the equation e^ = 2 — x. More recently, Sviridenko [Svi04] 
showed that the budgeted submodular maximization problem was still 1 — 1/e— approximable 
in polynomial time, with the help of an algorithm which associates the greedy with a partial 
enumeration of every solution of cardinality 3. 

We have attained so far an approximation factor of 1 — e~^ for all p G [0, 1[, while we have a 
guarantee of optimality of the greedy algorithm for p = 1. This leaves a feeling of mathematical 
dissatisfaction, since intuitively the problem should be easy when p is very close to 1. In the 
next section we remedy to this problem, by giving a rounding algorithm with an approximation 
factor F{j>) which depends on p, and such that p y-^ F{j>) is continuous, nondecreasing and 
limp^iF(p) = 1. 



3 Approximation by rounding algorithms 

The optimal design problem has a natural continuous relaxation which is simply obtained 
by removing the integer constraint on the design variable n, and has been extensively stud- 
ied [Atw73, DPZ08, YulO, Sagll]. As mentioned in the introduction, several authors proposed 
to solve this continuous relaxation and to round the solution to obtain a near-optimal discrete 
design. While this process is well understood when A^ > s, we are not aware of any bound 
justifying this technique in the underinstrumented situation N < s. 

3.1 A continuous relaxation 

The continuous relaxation of Problem (Pp) which we consider is obtained by replacing the 
integer variable n G N'^ by a continuous variable w in Problem (22): 

max ^JMf(w)) (3) 

w e(R+)'' 

T,kWk<N 

Note that the criterion (pp{w) is raised to the power 1/p in Problem (3) (we have 
^p{Mf(w)) = m~^^^(fp{wY^^ for p > 0). The limit of Problem (3) as p — ?■ 0^ is hence 
the maximization of the determinant of Mf{w) (cf. Equation (20)). 

We assume without loss of generality that the matrix Mp{l) = Ylk=i^^k ^^ '^^ ^^^^ rank 
(where 1 denotes the vector of all ones). This ensures the existence of a vector w which is 
feasible for Problem (3), and such that Mf{w) has full rank. If this is not the case (r* : = 
rank(Mj?(l)) < m), we define instead a projected version of the continuous relaxation: Let 
UTjU'^ be a singular value decomposition of MpiV). We denote by Ur* the matrix formed with 
the r* leading singular vectors of M^(l), i.e. the r* first columns of U . It can be seen that 
Problem (3) is equivalent to the problem with projected information matrices M^ '■= Uj*MkUr* 
(see Paragraph 7.3 in [Puk93]). 

The functions X h-)> log(det(X)) and X h-)> X^ {p g]0, 1]) are strictly concave on the interior 
of §^, so that the continuous relaxation (3) can be solved by interior-points technique or 
multiplicative algorithms [Atw73, DPZ08, YulO, Sagll]. The strict concavity of the objective 
function indicates in addition that Problem (3) admits a unique solution if and only if 



WiMi + W2M2 + ... + wMs = yiMi + 2/2M2 + . . . + VsMs ^{wu...,w,) = {y 



!■,■■■ lys), 

that is to say whenever the matrices Mj are linearly independent. In this paper, we focus on 
the rounding techniques only, and we assume that an optimal solution w* of the relaxation (3) 
is already known. In the sequel, we also denote a discrete solution of Problem {Pp) by n* and a 
binary solution of Problem (Pp'") by S* . Note that we always have i^piw*) > ippin*) > Lpp(S*). 

3.2 Posterior bounds 

In this section, we are going to bound from below the approximation ratio ipp{n) / (pp{w*) for 
an arbitrary discrete design n, and we propose a rounding algorithm which maximizes this 
approximation factor. The lower bound depends on the continuous optimal variable w*, and 
hence we refer it as a posterior bound. We start with a result for binary designs {Wi G [s], rii < 

10 



1), which we associate with a subset 5* of [s] as in Section 2. The proof rehes on several matrix 
inequahties and technical lemmas on the directional derivative of a scalar function applied to 
a symmetric matrix, and is therefore presented in Appendix B. 

Proposition 3.1. Let p & [0,1] and w* be optimal for the continuous relaxation (3) of Prob- 
lem (Pp). Then, for any subset S of [s], the following inequality holds: 



^j:("--'y-"^ 



^p{S) 



Remark 3.2. In this proposition and in the remaining of this article, we adopt the convention 
00 = 0. 

We point out that this proposition includes as a special case a result of Pukelsheim [PukSO], 
already generalized by Harman and Trnovska [HT09], who obtained: 

w* rank M,; 



A^ m 

i.e. the inequality of Proposition 3.1 for p = and a singleton 5* = {i}. However the proof is 
completely different in our case. Note that there is no constraint of the form Wj < 1 in the 
continuous relaxation (3), although the previous proposition relates to binary designs 5* G [s\. 
Proposition 3.1 suggests to select the N matrices with the largest coordinates w* to obtain a 
candidate S for optimality of the binary problem {P^™). We will give in the next section a 
prior bound (i.e., which does not depend on w*) for the efficiency of this rounded design. 

We can also extend the previous proposition to the case of replicated designs n eW (note 
that the following proposition does not require the design n to satisfy ^^ rij = A^): 

Proposition 3.3. Let p G [0, 1] and w* be optimal for the continuous relaxation (3) of Prob- 
lem (Pp). Then, for any design n G N'^, the following inequality holds: 



\^ ^P /'„,,* A i-p ^ ^P^ 



\<{w:f-'< 



n] 



Proof. We consider the problem in which the matrix Mi is replicated rii times: 

VzG [s], \/ke [nil Mi^k = Mi. 

Since w* is optimal for Problem (3), it is clear that (wj.A;)(j,fc)6u grsi{j}x[nj] is optimal for the 
problem with replicated matrices if 

^ie[s\,^Wi,k = w*i, (4) 

i.e. Wi^k is the part of w* allocated to the /c*^ copy of the matrix Mj. For such a vector. 
Proposition 3.1 shows that 
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Finally, it is easy to see (by concavity) that the latter lower bound is maximized with respect 
to the constraints of Equation (4) if Vi G [s] , Wk G [rii] , Wi^k 



"'i . 



/ \ -, S "i / ^:\ l~P 1 « 



-y 



j=i fc=i \ " y 1=1 

D 

We next give a simple rounding algorithm which finds the feasible design n which maximizes 
the lower bound of Proposition 3.3: 



max 

new 

Y,ni=N je[s 



J2 n", w] ". (5) 



The latter maximization problem is in fact a ressource allocation problem with a convex separable 
objective, and the incremental algorithm which we give below is well known in the resource 
allocation community (see e.g. [IK88]). 



Algorithm 3.1 [Incremental rounding 



Input: A nonnegative vector w; G M'' such that Ylt=i ^i = N E'N\ {0}. 
Sort the coordinates of w; We assume wlog that Wi > W2 > ■ ■ ■ > Wg] 
n^ [1,0...,0] gM' 
for A; = 2 ... A do 

Select an index i^ax € argmax ((rij + 1)^ — nf) w^~^ 

i&ls] 

Uj ■(— n,- +1 

''max I'-max ' 

end for 

return: a A— replicated design n which maximizes ^l=in\w^~^ . 

Remark 3.4. If w is sorted {wi > W2 > ■ ■ ■ > Wg), then the solution of Problem (5) clearly 
satisfies ni > ^2 > . . . > n^. Consequently, it is not necessary to test every index i G [s] 
to compute the argmax in Algorithm 3.1. Instead, one only needs to compute the increments 
((nj + 1)^ — n^) Wi~^ for the i G [s] such that z = 1 or rij + 1 < nj_i. 

We shall now give a posterior bound for the budgeted problem {P^'^^)- We only provide a 
sketch of the proof, since the reasoning is the same as for Propositions 3.1 and 3.3. We also 
point out that the approximation bound provided in the next proposition can be maximized 
over the set of 5— budgeted designs, thanks to a dynamic programming algorithm which we do 
not detail here (see [MM76]). 

Proposition 3.5. Let p G [0, 1] and w* be optimal for the continuous relaxation 



max < 



%{Mf{w)) : w>0, J^QWi <b\ (6) 



tels 



of Problem (P^ ^). Then, for any design n G N*, the following inequality holds: 



R E ^^< 



'^*Y^v<: Mn) 



ie[s\ ^P^ 
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Proof. First note that after the change of variable Zi := NB~^CiWi, the continuous relax- 
ation (6) can be rewritten under the standard form (3), where the matrix Mi is replaced by 
Ml = B{Nci)~^Mi. Hence, we know from Proposition B.4 that the optimality conditions of 
Problem (6) are: 

Vi G [s], BcrHTace{MFiw*y-^Mi) < ipp{w*), 

with inequality if w* > 0. Then, we can apply exactly the same reasonning as in the proof of 
Proposition 3.1, to show that 



ySc[s], l^c,«)i-P< 



^p{S) 



The only change is that the optimality conditions must be multiplied by a factor proportional 
to Ci{w\Y~^ (instead of {w*Y~'^ as in Equation (25)). Finally, we can apply the same arguments 
as in the proof of 3.3 to obtain the inequality of this proposition. D 

3.3 Prior bounds 

In this section, we derive 'prior bounds for the solution obtained by rounding the continuous 
solution of Problem (3), i.e. approximation bounds which depend only on the parameters p, N 
and s of Problems (Pp) and {Pp^'^^). We first need to state one technical lemma. 

Lemma 3.6. Let w ^ W be a nonnegative vector summing to r < s, r ^ N, and p be an 
arbitrary real in the interval [0, 1]. Assume without loss of generality that the coordinates of w 
are sorted, i.e. Wi > . . . > Wg > 0. If one of the following two conditions holds: 

{{) Vi G [s], Wi < 1; 

Inr 

{ti) p<l- - — , 
ms 

then, the following inequality holds: 

Proof. We start by showing the lemma under the condition (z). To this end, we consider the 
minimization problem 

r s 

min{N Wi'"'- / Wi = T] 1 > Wi > . . . > w^ > 0}. (7) 

2=1 i=l 

Our first claim is that the optimum is necessarily attained by a vector of the form w = 
[u + «!, . . . ,u + ar,u, . . . , u]'^, where ai, . . . ,ar > 0, i.e. the s — r coordinates of w which are 
not involved in the objective function are equal. To see this, assume ad absurbium that w is 
optimal for Problem (7), with Wi > Wj+i for an index i > r. Define k as the smallest integer 
such that Wi = W2 = . ■ ■ = Wk > Wk+i- Then, Cj — 1/k X^jeffcl ^j ^^ ^ feasible direction along 



13 



which the objective criterion Yll=i '^i ^ is decreasing, a contradiction. Problem (7) is hence 
equivalent to: 



min-j 



iin{ V(m + aiY'P : V a^ = r - sm; < m; < a^ < 1 - m (Vi G [r])}. (8) 

i=\ i=\ 

It is known that the objective criterion of Problem (8) is Schur-concave, as a symmetric sepa- 
rable sum of concave functions (we refer the reader to the book of Marshall and Olkin [M079] 
for details about the theory of majorization and Schur-concavity) . This tells us that for all 
u G [0, -], the minimum with respect to ol is attained by 



[1 — M, . . . , 1 — M,r — SM — k(\ — m), 0, . . . , 0] 



k times 



where k = [^^J (^^ ^ given u, this vector majorizes all the vectors of the feasible set). 
Problem (8) can thus be reduced to the scalar minimization problem 



mm 



r — su 
1-n 



+ [u + r — su — 



r — su 
1 -u 



(1-^)) +( 



r — su 
l-u 



ly-p. 



It is not difficult to see that this function is piecewise concave, on the r — 1 intervals of the form 



u e 



-(fc+i) 



s-(fc+l)' s-k 



k E [r — 1], corresponding to the domains where k = [j^] is constant. It 

follows that the minimum is attained for a m of the form j^, where k E [r], and the problem 
reduces to 

min A; + (r — A;) I 

ke[r] \s - k 

Finally, one can check that the objective function of the latter problem is nondecreasing with 
respect to k, such that the minimum is attained for k = (which corresponds to the uniform 
weight vector w = [r/s, . . . , r/s]'^). This achieves the first part of this proof. 

The proof of the lemma for the condition (ii) is similar. This time, we consider the mini- 
mization problem 

r s 

min{ > Wi~^ '■ / Wj = r; U7i > . . . > Ws > 0}. (9) 

i=l i=l 

Again, the optimum is attained by a vector of the form w = [u + ai, . . . ,u + ar,u, . . . , u]"^, 
which reduces the problem to: 

r r 

min{N (u + aiY^^: > ai = r — su; m, cti, . . . , a^ > 0}. (10) 

u,ct •^— ' ■^— ' 

i=l 4=1 

For a fixed u, the Schur-concavity of the objective function indicates that the minimum is 
attained for a. = [r — su, 0, . . . , 0]"^. Finally, Problem (10) reduces to the scalar minimization 
problem 

min (u+ (r — su)) + (r — 1)m^~^, 
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where the optimum is always attained for -u = or m = r/s by concavity. It now is easy to see 
that the inequahty of the lemma is satisfied when the latter minimum is attained for u = r/s, 
i.e. if r{-Y~P < r^~P, which is equivalent to the condition (ii) of the lemma. D 

As a direct consequence of this lemma, we obtain a prior approximation bound for Prob- 
lem (Pp") when p is in a neighborhood of 0. 

Theorem 3.7 (Approximation bound for A^— binary designs obtained by rounding). Let p G 

[0, 1], N < s and w* be a solution of the continuous optimal design problem (3). Let S be the 

N— binary design obtained by selecting the N largest coordinates of w* . If p < 1 — ^j^, then 

we have 

ip,{S) ^ MS) ^ fNy-p 



cpp{s*) iPpC^*) ^ s 

Proof. This is straightforward if we combine the result of Proposition 3.1 and the one of 
Lemma 3.6 for r = N and condition (ii). D 

In the next theorem, we give an approximation factor for the design provided by Algo- 
rithm 3.1. This factor F is plotted as a function of p and the ratio — on Figure 1. For every 
value of — , this theorem shows that there is a continuously increasing difficulty from the easy 
case (p = 1, where F = 1) to the most degenerate problem (p = 0, where F = min(y, 1 — ^)). 

Theorem 3.8 (Approximation bound for A^— replicated designs obtained by rounding). Let 
p e [0,1], w* be a solution of the continuous optimal design problem (3) and n be the vector 
returned by Algorithm 3.1 for the input w = w* . Then, we have 



ipp[n-) ipp[w- 



where F is defined by: 



i^Y'" ^f (f )'"' < 2^ (m part^cular, z/ f < e"^; 



[1 — p) I 2^— ) *" Otherwise (in particular, if ^ > 



Proof. For all i G [s] we denote by /j := w* — [w*\ the fractional part of w*, and we assume 
without loss of generality that these numbers are sorted, i.e. , /i > /2 > • • • > /«• We will 
prove the theorem through a simple (suboptimal) rounding n, which we define as follows: 



rii 



[w*\+l if^<Ar-E.gHK*J 
[w*\ Otherwise. 



We know from Proposition 3.3 and from the fact that Algorithm 3.1 solves Problem (5) the 
integer vector n satisfies 



(Pp{n) 

1=1 i=l 



^^^^ ^ E<«)"' ^ E <«)"'■ (ii: 
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(b) 



Figure 1: Approximation factor F of Theorem 3.8: (a) as a function of p and the ratio — (log 
scale); (b) as a function of p for selected values of — . 



We shall now bound from below the latter expression: if Ui = \_w*\ and \^w*\ ^ 0, then 



n'M^)'-" 



L<J 



Wi 



KJ 



> KJ- 



:i2i 



Note that Inequality (12) also holds if n^ = \w\\ = 0. If rii = [w*\ + 1, we write 



<K)^-^ 



> 1^-^ + ■ ■ ■ + I'z+fi = KJ + /^^ (13) 



i-p 



[ui*J terms 



rii terms 



where the inequality is a consequence of the concavity of lu i— )■ ^ Wj ^ . Combining Inequali 
ties (12) and (13) yields 



s s ^-ELiKJ N-N 

E <K)^-^ > E KJ + E //"' = ^ + E /^'' 

i=l 1=1 j=l j=l 

where we have set N := X]i=iK*J ^ {max(A^ — s + 1, 0), . . . , A^}. Since the vector / = 
[/i; • • • ; fs] sums to N ~ N , we can apply the result of Lemma 3.6 with condition (z), with 
r = N — N, and we obtain 



E<^l"''>^+(^-^) ( 



> mm u + [N — uj \ 

S J u£[0,N] V S 



We will compute this lower bound in closed-form, which will provide the approximation 
bound of the theorem. To do this, we define the function g : u ^-^ u + (N — u) (^^^7^) on 
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] — oo, A^], and we observe (by differentiating) that g is decreasing on ] — oo, u*] and increasing 

on [m*, A^[, where 

1 
1 \ ^-p 

u* = N-s' 



Hence, only two cases can appear: either u* < 0, and the minimum of g over [0,N] is attained 
for u = 0; or u* > 0, and g\[o,N] attains its minimum ai u = u*. Finally, the bound given in 
this theorem is either N~^g{0) or N~^g{u*), depending on the sign of u*. In particular, since 
the function 

is nonincreasing on the interval [0, 1], with h{0) = | and h{l) = e~^, we have: 

Vj9 e [0, 11, — < e"^ ^ M* < and — > - ^ m* > 0. 
s s 2 

n 

Remark 3.9. The alternative rounding h is very useful to obtain the formula of Theorem 3.8. 
However, since n differs from the design n returned by Algorithm 3.1 in general, the inequality 
(^L > F is not tight. Consider for example the situation where p = and N = s, which is a 
trivial case for the rank optimization problem (Pq): the incremental rounding algorithm always 
returns a design n such that {w* > ^ rij > 0), and hence the problem is solve to optimality 
(the design is of full rank). In contrast. Theorem 3.8 only guarantees a factor F = ^ for this 
class of instances. 

Remark 3.10. We point out that Theorem 3.8 improves on the greedy approximation factor 
1 — e~^ in many situations. The gray area of Figure 2 shows the values of ( — , p) G M.*^ x [0, 1] for 
which the approximation guarantee is better with Algorithm 3.1 than with the greedy algorithm 
of section 2. 

Remark 3.11. Recall that the relevant criterion for the theory of optimal design is the positively 
homogeneous function w 1— )■ ^p(^Mf{w)^ = m~^^P(fp{wY^P (cf. Equation (20)). Hence, if a 
design is within a factor F of the optimum with respect to (fp, its $p— efficiency is F^^^. In the 
overinstrumented case N > s, Pukelsheim gives a rounding procedure with a $p— efficiency of 
1 — ^ (Chapter 12 in [Puk93]). We have plotted in Figure 3 the area of the domain {j^,p) G 
[0, 1]^ where the approximation guarantee of Theorem 3.8 is better. 



4 Conclusion 

This paper gives bounds on the behavior of some classical heuristics used for combinatorial 
problems arising in optimal experimental design. Our results can either justify or discard 
the use of such heuristics, depending on the settings of the instances considered. Moreover, 
our results con&m some facts that had been observed in the literature, namely that rounding 
algorithms perform better if the density of measurements is high, and that the greedy algorithm 
always gives a quite good solution. We illustrate these observations with two examples: 
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Figure 2: in gray, values of ( — ,p) G M!^ x [0, 1] such that the factor F of Theorem 3.8 is larger 
than 1 — e~^. 




Figure 3: in gray, values of {jf,p) G [0, 1]^ such that the factor F of Theorem 3.8 is larger than 

(1 - s/N)P. 



In a sensor location problem, Ucinski and Patan [UP07] noticed that the trimming of a 
Branch and Bound algorithm was better if they activated more sensors, although this led to 
a much larger search space. The authors claims that this surprising result can be explained 
by the fact that a higher density of sensors leads to a better continuous relaxation. This is 
confirmed by our result of approximability, which shows that the larger is the number of selected 
experiments, the better is the quality of the rounding. 

It is also known that the greedy algorithm generally gives very good results for the optimal 
design of experiments (see e.g. [SQZ06], where the authors explicitly chose not to implement a 
local search from the design greedily chosen, since the greedy algorithm already performs very 
well). Our (1 — 1/e)— approximability result guarantees that this algorithm always well behaves 
indeed. 
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Appendix 



A From optimal design of statistical experiments to 
Problem (Pp) 

A.l The classical linear model 

We denote vectors by bold-face lowercase letters and we make use of the classical notation 
[s] :={!,..., s} (and we define [0] := 0). The set of nonnegative (resp. positive) real numbers 
is denoted by IR+ (resp. M+), the set oimxm symmetric (resp. symmetric positive semidefinite, 
symmetric positive definite) is denoted by Sm (resp. §^, S^). The expected value of a random 
variable X is denoted by E[X]. 

We denote by G M™ the vector of the parameters that we want to estimate. In accordance 
with the classical linear model, we assume that the experimenter has a collection of s exper- 
iments at his disposal, each one providing a (multidimensional) observation which is a linear 
combination of the parameters, up to a noise on the measurement whose covariance matrix is 
known and positive definite. In other words, for each experiment i E [s], we have 



Vi 



Ai0 + Si, E[ei 



0, 



EleiSi 



(14) 



where yi is the vector of measurement of size /j, Ai is a (/, x ?7i)— matrix, and Ej G §^^ 
is a known covariance matrix. We will assume that the noises have unit variance for the 
sake of simplicity: Sj = /. We may always reduce to this case by a left multiplication of 
the observation equation (14) by S^ . The errors on the measurements are assumed to be 
mutually independent, i.e. 

t^j^ E[eie/] = 0. 

As explained in the introduction, the aim of experimental design theory is to choose how 
many times each experiment will be performed so as to maximize the accuracy of the estimation 
of 0, with the constraint that A^ experiments may be conducted. We therefore define the 
integer- valued design variable n E W, where n^ indicates how many times the experiment k 
is performed. We denote by ik G [s] the index of the k^^ conducted experiment (the order 
in which we consider the measurements has no importance), so that the aggregated vector of 
observation reads: 

y = A6 + e, 



(15) 



where y 



y 



«i 



y 



2JV 



A 



A 



In 



E[e] = 0, and E[ee^] = /. 



Now, assume that we have enough measurements, so that A is of full rank. A common 
result in the field of statistics, known as the Gauss-Markov theorem, states that the best linear 
unbiased estimator of 6 is given by a pseudo inverse formula. Its variance is given below: 



A^A) A^y. 



e 



Var(^) = {A^A)-\ 



(16) 
(17) 
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We denote the inverse of the covariance matrix (17) by Mf{ti), because in the Gaussian 
case it coincides with the Fisher information matrix of the measurements. Note that it can be 
decomposed as the sum of the information matrices of the selected experiments: 

Mpin) = A^A 

N 



j:aia 



k=l 
s 



5^n,AfAi. 



«=1 



The classical experimental design approach consists in choosing the design n in order to make 
the variance of the estimator (16) as small as possible. The interpretation is straightforward: 
with the assumption that the noise e is normally distributed, for every probability level a, the 
estimator 6 lies in the confidence ellipsoid centered at 6 and defined by the following inequality: 

{e-6fQi6-e)<t,^, (19) 

where Ka depends on the specified probability level, and Q = Mp{n) is the inverse of the 
covariance matrix Vai{6). We would like to make these confidence ellipsoids as small as possible, 
in order to reduce the uncertainty on the estimation of 6. To this end, we can express the 
inclusion of ellipsoids in terms of matrix inequalities. The space of symmetric matrices is 
equipped with the Lowner ordering, which is defined by 

Let n and n' denote two designs such that the matrices Mpln) and Mp{n') are invertible. 
One can readily check that for any value of the probability level a, the confidence ellipsoid (19) 
corresponding to Q = Mpln) is included in the confidence ellipsoid corresponding to Q' = 
MF{n') if and only if Mp^n) >z Mp{n'). Hence, we will prefer the design n to the design n' 
if the latter inequality is satisfied. 

A. 2 Statement of the optimization problem 

Since the Lowner ordering on symmetric matrices is only a partial ordering, the problem con- 
sisting in maximizing Mpln) is ill-posed. So we will rather maximize a scalar information 
function of the Fisher matrix, i.e. a function mapping S+ onto the real line, and which satisfies 
natural properties, such as positive homogeneity, monotonicity with respect to Lowner order- 
ing, and concavity. For a more detailed description of the information functions, the reader is 
referred to the book of Pukelsheim [Puk93], who makes use of the class of matrix means $p, 
as first proposed by Kiefer [Kie75]. These functions are defined like the Lp-norm of the vector 
of eigenvalues of the Fisher information matrix, but for p G [— oo, 1]: for a symmetric positive 
definite matrix M e S^^, $p is defined by 

{Amm(M) for p = -oo ; 

(^ trace MP)p for p G ] - cx), 1], p ^ 0; (20) 

(det(M))^ iorp = 0, 
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where we have used the extended definition of powers of matrices M^ for arbitrary real parame- 
ters p: if Ai, . . . , Am are the eigenvalues of M counted with multiplicities, trace Af = X^jLi -^j- 
For singular positive semi-definite matrices M G S^^ ^p i^ defined by continuity: 

, , f for » G [-00, 01 : , , 

^^ ' \(^ trace MP)p forpG]0,l]. ^ ' 

The class of functions $p includes as special cases the classical optimality criteria used in 
the experimental design literature, namely E'— optimality for p = —00 (smallest eigenvalue 
of Mp{n)), D— optimality for p = (determinant of the information matrix), A— optimality 
for p = — 1 (harmonic average of the eigenvalues), and T— optimality for p = 1 (trace). The 
case p = (D-optimal design) admits a simple geometric interpretation: the volume of the 
confidence ellipsoid (19) is given by Cmi^^ det((5)~^/^ where Cm > is a constant depending 
only on the dimension. Hence, maximizing ^Q{Mp{n)) is the same as minimizing the volume 
of every confidence ellipsoid. 

We can finally give a mathematical formulation to the problem of selecting A^ experiments 
to conduct among the set [s\: 



s 

max $« f V riiAjAi) (22) 

1=1 

s.t. ^ni<N, 



A. 3 The underinstrumented situation 

We note that the problem of maximizing the information matrix Mp{n) with respect to the 
Lowner ordering remains meaningful even when Mp{n) is not of full rank (the interpretation 
oiMpin) as the inverse of the covariance matrix of the best linear unbiased estimator vanishes, 
but Mp{n) is still the Fisher information matrix of the experiments if the measurement errors 
are Gaussian). This case does arise in underinstrumented situations, in which some constraints 
may not allow one to conduct a number of experiments which is sufficient to infer all the 
parameters. 

An interesting and natural idea to find an optimal under-instrumented design is to choose the 
design which maximizes the rank of the observation matrix A, or equivalently of Mp{ri) = A'^A. 
The rank maximization is a nice combinatorial problem, where we are looking for a subset of 
matrices whose sum is of maximal rank: 



max rank ( > UjAJA, 

i 

S.t. ^ni<N. 



When every feasible information matrix is singular. Equation (21) indicates that the max- 
imization of ^p{Mp{n)) can be considered only for nonnegative values of p. Then, the next 
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proposition shows that $p can be seen as a deformation of the rank criterion for p g]0, 1]. First 
notice that when p > 0, the maximization of $p(Mi?(n)) is equivalent to: 



max 



fp[rij := trace ( y UjAJAj ) 

i 

s. t. ^ni< N. 



If we set Mi = AjAi, we obtain the problems (Pq) and (Pp) which were presented in the first 
lines of this article. 

Proposition A.l. For all positive semidefinite matrix M G §^, 

lim trace M^ = rank M. (23) 

P-5-0+ 

Proof. Let Ai, . . . , A^ denote the positive eigenvalues of M, counted with multiplicities, so that 
r is the rank of M. We have the first order expansion as p — )■ 0"^: 

r r 

trace M^ = ^ A^ = r + p log( JJ A^) + 0{p^) (24) 

fc=l k=l 

D 

Consequently, trace M° will stand for rank(M) in the sequel and the rank maximization 
problem (Pq) is the limit of problem (P^) as j9 — > O"*". 

Corollary A. 2. If p > is small enough, then every design n* which is a solution of Prob- 
lem (Pp) maximizes the rank of Mp{n). Moreover, among the designs which maximize this 
rank, n* maximizes the product of nonzero eigenvalues of Mp{n). 

Proof. Since there is only a finite number of designs, it follows from (24) that for p > small 
enough, every design which maximizes (pp must maximize in the lexicographical order first the 
rank of Mp{n), and then the pseudo-determinant n|fc-Afc>o} ^k- Q 

B Proof of Proposition 3.1 

The proof of Proposition 3.1 relies on several lemmas on the directional derivative of a scalar 
function applied to a symmetric matrix, which we state next. First recall that if / is differen- 
tiable on W^, then / is Frechet differentiable over §^^, and for M G Sm"*", -f/^ G Sm, we denote 
by Df{M){H) its directional derivative at M in the direction of H (see Equation (2)). 

Lemma B.l. /// is continuously differentiable on R*^, i.e. f G C^(]R;^), M G §++,y4,P G Sm, 
then 

trace(A Df{M){B)) = trace(P Df{M){A)). 



26 



Proof. Let M = QDQ^ be an eigenvalue decomposition of M. It is known (see e.g. [Bha97]) 
that Df{M){H) can be expressed as Q{f^\D) Q^HQ)Q^, where f^^^{D) is a symmetric 
matrix called the first divided difference of / at D and denotes the Hadamard (elementwise) 
product of matrices. With little work, the latter derivative may be rewritten as: 



«j 



where q^ is the k^^ eigenvector of M (i.e., the k^^ column of Q) and /j- denotes the 
(i,j)— element of f^^'{D). We can now conclude: 

trace(A Df{M){B)) = J^ flf ^^^^^^qiq^^ Bq^qj^) 






Y^ fjl tTace{Bqjqj'^Hqiqi 



= tTace{B Df{M){A)) 

n 

We next show that when / is antitone, the mapping X i— )■ Df {M){X) is nonincreasing with 
respect to the Lowner ordering. 

Lemma B.2. // / is differentiahle and antitone on R^, then for all A, B in B>ra, 

A^B^ Df{M){A) >z Df{M){B). 

Proof. The lemma trivially follows from the definition of the directional derivative: 

DfiM)iA)=\im-^{fiM + eA)-f{M)) 

and the fact that A^ B implies M + eA ^ M + eB hr all e > 0. D 

Lemma B.3. Let f be differentiahle on M.\, M G S^,^, A G §m- If ^ o,nd M commute, then 

Df{M){A) = f'{M)AeSm, 
where f denotes the (scalar) derivative of f . 
Proof. Since A and M commute, we can diagonalize them simultaneously: 

M = Q Diag(A)Q^, A = Q Diag{tj,)Q^. 
Thus, it is clear from the definition of the directional derivative that 

Df{M){A) = Q D/(Diag(A))(Diag(Ai)) Q^ . 
By reasoning entry-wise on the diagonal matrices, we find: 

D/(Diag(A)) (Diag(Ai)) = Diag (/'(Ai)/ii, . . . , /'(A^)/i„) = Diag (/'(A)) Diag(^) 
The equality of the lemma is finally obtained by writing: 

Df{M){A) = QDiag (/'(A)) Diag(M)g^ = QDiag (/'(A))Q'^QDiag(^)Q^ = f\M)A. 
Note that the matrix f'{M)A is indeed symmetric, because f'{M) and A commute. D 
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Before we give the proof of the main result, we recall an important result from the theory 
of optimal experimental designs, which characterizes the optimum of Problem (3). 

Proposition B.4 (General equivalence theorem [Kie74]). Letp G [0, 1]. A design w* is optimal 
for Problem (3) if and only if: 

yi e [s], NtTace{MFiw*)P-^Mi) < ipp{w*). 

Moreover, the latter inequalities become equalities for all i such that w* > 0. 

For a proof of this result, see [Kie74] or Paragraph 7.19 in [Puk93], where the problem is 
studied with the normalized constraint Yli'^i — 1- I^ fact, the general equivalence theorem 
details the Karush-Kuhn- Tucker conditions of optimality of Problem (3). To derive them, one 
can use the fact that when Mf{w) is invertible, 

^^^^ = tTace{Mp{wy-'Mi) for all pG]0,l], 

and 

— ^^ ^ " = tr ace(MF(w)-^ Mi). 

OWi 

Note that for p ^ 1, the proposition implicitly implies that Mf{w*) is invertible. A proof of 
this fact can be found in Paragraph 7.13 of [Puk93]. 
We can finally prove the main result: 

Proof of Proposition 3.1. Let w* be an optimal solution to Problem (3) and 5* be a subset 
of [s] such that w* > for all z G 5* (the case in which w* = for some index i E S 
will trivially follow if we adopt the convention 0° = 0). We know from Proposition B.4 that 
N~^iPp(^w*^ = trace(M^(w;*)P~^Mj) for all i in 5*. If we combine these equalities by multiplying 
each expression by a factor proportional to {w*Y~^, we obtain: 

1^,(^*) = E V ^""t^V-v ^^^^<MF{wr-'Mi) (25) 

, , 1 V-, .,1-r, E.e5«)'""trace(Mp(^*riM,: 



We are going to show that for aA\ w > such that Mf{w) is invertible, 
^.g5W-"^trace(Mir(i(;)P-^Mi) < trace(M5)P, where Ms := Y^ies^i^ which will complete 
the proof. To do this, we introduce the function / defined on the open subset of {^+Y such 
that Mf{w) is invertible by: 

f{w) = Ew,^-Ptrace(Mir(i(;)P-^Mi) = trace j (j^^l'^^i)^^^'^)'''^ 

ie5 \ i€S / 

Note that / satisfies the property f{tw) = f{w) for all positive scalar t; this explains why we 
do not have to work with normalized designs such that ^^ Wi = N. Now, let it> > be such 
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that Mf{w) >- and let k be an index of 5* such that Wk = minjg^Wj. We are first going to 

By the rule of differentiation of a product, 



Q^ trace (1 - p)w-/MkMF{wY-^ + ( J^ wl'^'M-^ ^^^ 

\ i&S 

= trace ( {I - p)wl'"MkMp{wy-^ + (^ w;,^"^Mi)D[x ^ xP"^](M^(io))(Mfc 



i&S 



= trace Mfc [ (1 - p)w~^Mf(io)p-^ + D[x ^ a;^"^] (Mf(io)) ( ^u/^^-^Mi)) j , (26) 

where the first equality is simply a rewriting of q^ — by using a directional derivative, 
and the second equality follows from Lemma B.l applied to the function x i— )■ x^~^. By linearity 
of the Frechet derivative, we have: 

wl D[x ^ xP-'] (AMw)) (Y.^l'^'Mi) = D[x ^ xP-'] {Mf{w)) {J^^i (—) Mi). 

iGS i€S ^^'^ 

Since Wk < Wi for all i & S, the following matrix inequality holds: 

By applying successively Lemma B.2 {x i— )■ x^~^ is antitone on M^) and Lemma B.3 (the matrix 
Mp{w) commutes with itself), we obtain: 

wl D[x ^ xP-^]{Mf{w)){J2wI~^M,) y D[x ^ xP-^]{Mf{w)){Mf{w)) 



i£S 



= {p- l)MFiwf~^MFiw) 

= {p-i)MF{wy~\ 

Dividing the previous matrix inequality by w^., we find that the matrix that is inside the largest 
parenthesis of Equation (26) is positive semidefinite, from which we can conclude: g^' > 0. 

Thanks to this property, we next show that f{w) < f{v), where t; G M* is defined by Vi = 
maxfcg5(wfc) if i e S" and Vi = Wi otherwise. Assume without loss of generality (after a reordering 
of the coordinates) that S = [sq] , Wi < W2 < ■ ■ ■ < Wg^ , and denote the vector of the remaining 
components of it> by iD (i.e., we have w'^ = [wi, . . . , Ws^, w] and v'^ = [Wsq, ■ ■ ■ , w^q, w]). The 
following inequalities hold: 



/M = / 
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The first inequality holds because g^' > as long as wi < W2. To see that the second inequal- 
ity holds, we apply the same reasoning on the function / : [102,103, . . .] h-> f {[102,102,103, . . .]), 
i.e., we consider a variant of the problem where the matrices Mi and M2 have been replaced 
by a single matrix Mi + M2. The following inequalities are obtained in a similar manner. 
Recall that we have set Ms = Ylii<^s^i- ^^ have: 

Mf{v) = Ws,Ms + J2^^Mi h WsoMs 

and by isotonicity of the mapping x h-). x^^^, Mp{vy^'^ >z (ws„ MsY^^. 

We denote by X^ the Moore-Penrose inverse of X. It is known [PS83] that if Mi G S^, the 
function X i— )■ trace(X'''Mj) is nondecreasing with respect to the Lowner ordering over the set 
of matrices X whose range contains Mj. Hence, since Mf{v) >z Mp{w) is invertible, 

Vz G S, tTace{MF{vY~^Mi) = trace ('(M^(i;)^"P)"^Mi) < trace ({{ws,, MsY'^y M^ 

and 

f{v) = wl;^'J2^Ta.ce{MF{vr-'M,) 

ies 

< t.l/J^trace ((K„ Msf-^Y M?j 

ies 

= trace {Mg'^y Ms 
= trace Mg 

Finally, we have f{w) < f{v) < trace M| = fp{S), and the proof is complete. D 
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